Topological Machine Learning for Mixed Numeric and Categorical Data
Abstract
Topological data analysis is a relatively new branch of machine learning that excels in studying high-dimensional data and is theoretically known to be robust against noise. Meanwhile, data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications. However, topological methods are usually applied to point cloud data, and to the best of our knowledge there is no available framework for the classification of mixed data using topological methods. In this paper, we propose a novel topological machine learning method for mixed data classification. In the proposed method, we use theory from topological data analysis, such as persistent homology, persistence diagrams and Wasserstein distance, to study mixed data. The performance of the proposed method is demonstrated by experiments on a heart disease dataset. Experimental results show that our topological method outperforms several state-of-the-art algorithms in the prediction of heart disease.
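The abstract names persistent homology, persistence diagrams and Wasserstein distance but does not say how mixed records are turned into a metric space, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a Gower-style dissimilarity (range-scaled difference for numeric attributes, simple mismatch for categorical ones), computes only 0-dimensional persistence via single-linkage merging, and compares two diagrams with a brute-force 1-Wasserstein distance that is practical only for very small diagrams. All function names, the attribute ranges and the toy records are hypothetical and are not taken from the heart disease dataset.

```python
from itertools import permutations

def gower_distance(x, y, numeric_ranges):
    """Gower-style dissimilarity (an assumption, not the paper's choice).
    numeric_ranges maps the index of each numeric attribute to its range."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_ranges:                 # numeric: range-scaled difference
            total += abs(a - b) / numeric_ranges[i]
        else:                                   # categorical: simple mismatch
            total += 0.0 if a == b else 1.0
    return total / len(x)

def h0_persistence(points, dist):
    """0-dimensional persistence pairs of the Vietoris-Rips filtration.
    Every component is born at 0 and one component dies at each merge,
    so the deaths are the edge weights picked by Kruskal's algorithm."""
    n = len(points)
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path halving
            i = parent[i]
        return i

    diagram = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                            # two components merge at scale w
            parent[ri] = rj
            diagram.append((0.0, w))
    return diagram                              # the never-dying class is omitted

def wasserstein_1(dgm_a, dgm_b):
    """1-Wasserstein distance with an L-infinity ground metric.
    Brute force over all matchings; only suitable for tiny diagrams."""
    n, m = len(dgm_a), len(dgm_b)

    def cost(i, j):
        if i < n and j < m:                     # point matched to point
            (b1, d1), (b2, d2) = dgm_a[i], dgm_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < n:                               # point of A sent to the diagonal
            b, d = dgm_a[i]
            return (d - b) / 2
        if j < m:                               # point of B sent to the diagonal
            b, d = dgm_b[j]
            return (d - b) / 2
        return 0.0                              # diagonal matched to diagonal

    size = n + m
    return min(sum(cost(i, p[i]) for i in range(size))
               for p in permutations(range(size)))

# Hypothetical toy records: (age, cholesterol, sex, chest pain type).
ranges = {0: 20.0, 1: 100.0}                    # assumed attribute ranges
group_a = [(50, 200, "M", "typical"), (60, 200, "M", "typical"),
           (50, 250, "F", "typical")]
group_b = [(50, 200, "M", "typical"), (50, 200, "M", "atypical"),
           (70, 200, "M", "atypical")]

def dist(x, y):
    return gower_distance(x, y, ranges)

dgm_a = h0_persistence(group_a, dist)
dgm_b = h0_persistence(group_b, dist)
print(dgm_a, dgm_b, wasserstein_1(dgm_a, dgm_b))
```

In practice a dedicated library would be used for higher-dimensional homology and for an efficient optimal matching; the brute-force matching here only keeps the sketch dependency-free.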
Similar resources
Clustering Large Data Sets with Mixed Numeric and Categorical Values
Efficient partitioning of large data sets into homogenous clusters is a fundamental problem in data mining. The standard hierarchical clustering methods provide no solution for this problem due to their computational inefficiency. The k-means based methods are promising for their efficiency in processing large data sets. However, their use is often limited to numeric data. In this paper we pres...
Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real life data mining applications. In this paper, we propose a novel divide-and-conquer techniq...
An improved k-prototypes clustering algorithm for mixed numeric and categorical data
Data objects with mixed numeric and categorical attributes are commonly encountered in real world. The k-prototypes algorithm is one of the principal algorithms for clustering this type of data objects. In this paper, we propose an improved k-prototypes algorithm to cluster mixed data. In our method, we first introduce the concept of the distribution centroid for representing the prototype of c...
Clustering Algorithm for Incomplete Data Sets with Mixed Numeric and Categorical Attributes
The traditional k-prototypes algorithm is well versed in clustering data with mixed numeric and categorical attributes, while it is limited to complete data. In order to handle incomplete data set with missing values, an improved k-prototypes algorithm is proposed in this paper, which employs a new dissimilarity measure for incomplete data set with mixed numeric and categorical attributes and a...
A k-mean clustering algorithm for mixed numeric and categorical data
Use of traditional k-mean type algorithm is limited to numeric data. This paper presents a clustering algorithm based on k-mean paradigm that works well for data with mixed numeric and categorical features. We propose new cost function and distance measure based on co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We pr...
Journal
Journal title: International Journal on Artificial Intelligence Tools
Year: 2021
ISSN: 1793-6349, 0218-2130
DOI: https://doi.org/10.1142/s0218213021500251